## # A tibble: 19 x 2
##      Q29     n
##    <dbl> <int>
##  1    13   915
##  2    NA   771
##  3     5   745
##  4     4   636
##  5     8   351
##  6     3   180
##  7    11   173
##  8    16   151
##  9    10   128
## 10     7   119
## 11    17   114
## 12     1    84
## 13    12    71
## 14    18    69
## 15     6    55
## 16     2    21
## 17    14    13
## 18     9     6
## 19    15     3
## # A tibble: 5 x 2
##     Q32     n
##   <dbl> <int>
## 1     1  1038
## 2     2  1122
## 3     3  1121
## 4     4   702
## 5    NA   622
## # A tibble: 5 x 2
##     Q32     n
##   <dbl> <int>
## 1     1    72
## 2     2    81
## 3     3    80
## 4     4    75
## 5    NA   463
## # A tibble: 13 x 2
##      Q33     n
##    <dbl> <int>
##  1     1  1004
##  2     2    63
##  3     3   143
##  4     4   904
##  5     5    60
##  6     6    68
##  7     7    66
##  8     8   252
##  9     9   419
## 10    10   420
## 11    11   224
## 12    12   388
## 13    NA   594
## # A tibble: 13 x 2
##      Q33     n
##    <dbl> <int>
##  1     1   947
##  2     4   835
##  3    10   399
##  4     9   382
##  5    12   351
##  6     8   237
##  7    11   199
##  8    NA   133
##  9     3   119
## 10     6    62
## 11     5    57
## 12     7    57
## 13     2    56
## # A tibble: 6 x 2
##     Q37     n
##   <dbl> <int>
## 1     1  2985
## 2     2  1016
## 3     3    29
## 4     4    44
## 5  1234     1
## 6    NA   530

Drop rows with NA responses to specific questions, then filter out disciplines with fewer than 30 students in the sample (the cutoff)

## # A tibble: 18 x 2
##      Q29     n
##    <dbl> <int>
##  1    13   644
##  2     5   540
##  3     4   443
##  4     8   243
##  5    16   120
##  6     3   111
##  7    11   111
##  8     7    95
##  9    10    95
## 10    17    81
## 11     1    70
## 12    12    57
## 13    18    42
## 14     6    35
## 15     2    16
## 16    14     8
## 17     9     3
## 18    15     1
## # A tibble: 14 x 2
##      Q29     n
##    <dbl> <int>
##  1    13   644
##  2     5   540
##  3     4   443
##  4     8   243
##  5    16   120
##  6     3   111
##  7    11   111
##  8     7    95
##  9    10    95
## 10    17    81
## 11     1    70
## 12    12    57
## 13    18    42
## 14     6    35
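The two filtering steps above can be sketched as follows. This is a minimal illustration on made-up data, not the original code: the data frame `survey`, the choice of `Q32` as the "specific question", and the name `min_n` are all assumptions.

```r
library(dplyr)

# Toy stand-in for the survey data (hypothetical values).
# Q29 is the major code; Q32 stands in for one required item.
survey <- tibble(
  Q29 = c(rep(13, 50), rep(5, 45), rep(9, 6), rep(NA, 4)),
  Q32 = rep(c(1, 2, 3, 4, NA), length.out = 105)
)

min_n <- 30  # cutoff stated in the text

cleaned <- survey %>%
  # Step 1: drop rows with NA on the required questions
  filter(!is.na(Q29), !is.na(Q32)) %>%
  # Step 2: attach each major's sample size and keep majors at or above the cutoff
  add_count(Q29, name = "n_major") %>%
  filter(n_major >= min_n) %>%
  select(-n_major)

count(cleaned, Q29, sort = TRUE)
```

Counting after the NA filter matters: a major that starts above the cutoff can fall below it once incomplete responses are removed.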

Major counts and percentages

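| Q29 | major   |   n | pct_total | cumulat_pct |
|----:|:--------|----:|----------:|------------:|
|  13 | Mec     | 644 |     23.97 |       23.97 |
|   5 | Che     | 540 |     20.10 |       44.06 |
|   4 | Civ     | 443 |     16.49 |       60.55 |
|   8 | Ele     | 243 |      9.04 |       69.59 |
|  16 | Softw   | 120 |      4.47 |       74.06 |
|   3 | Bio     | 111 |      4.13 |       78.19 |
|  11 | Ind     | 111 |      4.13 |       82.32 |
|   7 | Comp    |  95 |      3.54 |       85.86 |
|  10 | Env/Eco |  95 |      3.54 |       89.39 |
|  17 | Str/Arc |  81 |      3.01 |       92.41 |
|   1 | Aer/Oce |  70 |      2.61 |       95.01 |
|  12 | Mat     |  57 |      2.12 |       97.13 |
|  18 | Gen     |  42 |      1.56 |       98.70 |
|   6 | Con     |  35 |      1.30 |      100.00 |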
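A table of counts, percentages, and cumulative percentages like the one above could be produced along these lines. The data here are made up, and the data frame name `majors` is an assumption; only the column names `pct_total` and `cumulat_pct` come from the table.

```r
library(dplyr)

# Toy data: one row per student (hypothetical counts)
majors <- tibble(major = c(rep("Mec", 5), rep("Che", 3), rep("Civ", 2)))

major_counts <- majors %>%
  count(major, sort = TRUE) %>%
  mutate(
    pct_total   = round(100 * n / sum(n), 2),
    # cumulate the raw counts first, then round, to avoid compounding rounding error
    cumulat_pct = round(100 * cumsum(n) / sum(n), 2)
  )

major_counts
```

Cumulating before rounding would explain why the second row of the table shows 44.06 rather than 23.97 + 20.10 = 44.07.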

Gender counts overall

| Q37 |    n | pct_total |
|----:|-----:|----------:|
|   1 | 1973 |     73.43 |
|   2 |  678 |     25.23 |
|   3 |   15 |      0.56 |
|   4 |   21 |      0.78 |

Fill in 0s for NAs on specific items (Q1, Q3, Q5)
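One way to do this fill is sketched below. The column names (`Q1a`, `Q3a`, `Q5a`) and the naming pattern used to select them are assumptions for illustration; the real item names are not shown in this output.

```r
library(dplyr)

# Toy data (hypothetical): NA on a Q1/Q3/Q5 item is treated as "not selected"
items <- tibble(
  Q1a = c(1, NA, 1),
  Q3a = c(NA, NA, 1),
  Q5a = c(1, 1, NA),
  Q37 = c(1, NA, 2)  # not one of the listed items, so its NA should stay NA
)

filled <- items %>%
  # assumed naming pattern: Q1, Q3, or Q5 followed by a single lowercase letter
  mutate(across(matches("^Q(1|3|5)[a-z]$"), ~ coalesce(.x, 0)))

filled
```

Restricting the fill to the named items keeps genuine missingness elsewhere (for example, on demographic questions) intact.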

Drop majors with low counts (below 30 students in sample)

## # A tibble: 14 x 2
##      Q29     n
##    <dbl> <int>
##  1    13   618
##  2     5   523
##  3     4   433
##  4     8   240
##  5    16   118
##  6     3   110
##  7    11   108
##  8     7    93
##  9    10    90
## 10    17    78
## 11     1    70
## 12    12    55
## 13    18    40
## 14     6    32
## # A tibble: 14 x 2
##    major       n
##    <chr>   <int>
##  1 Mec       618
##  2 Che       523
##  3 Civ       433
##  4 Ele       240
##  5 Softw     118
##  6 Bio       110
##  7 Ind       108
##  8 Comp       93
##  9 Env/Eco    90
## 10 Str/Arc    78
## 11 Aer/Oce    70
## 12 Mat        55
## 13 Gen        40
## 14 Con        32

Drop students whose major is NA

## # A tibble: 14 x 2
##    major       n
##    <chr>   <int>
##  1 Mec       618
##  2 Che       523
##  3 Civ       433
##  4 Ele       240
##  5 Softw     118
##  6 Bio       110
##  7 Ind       108
##  8 Comp       93
##  9 Env/Eco    90
## 10 Str/Arc    78
## 11 Aer/Oce    70
## 12 Mat        55
## 13 Gen        40
## 14 Con        32

Clustering Process (two-step process using UMAP + HDBSCAN)

First perform dimension reduction using UMAP

## NULL
##           [,1]      [,2]
## [1,]  15.62332 -15.97910
## [2,]  11.12283 -18.44035
## [3,] -34.20802  18.18004
## [1]  15.623323  11.122833 -34.208024 -34.419584   6.551877 -34.582958
## [1] -15.97910 -18.44035  18.18004  18.60377 -23.96864  18.41222
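For orientation, a two-dimensional UMAP embedding can be obtained roughly as follows. This sketch assumes the `uwot` package; the output above does not show which UMAP implementation or parameter values were used, so `n_neighbors` and `min_dist` here are placeholders, and the input matrix is synthetic.

```r
library(uwot)

set.seed(42)

# Synthetic stand-in: 200 respondents x 9 items with values 0-4 (hypothetical)
X <- matrix(as.numeric(sample(0:4, 200 * 9, replace = TRUE)), nrow = 200)

# Reduce to 2 dimensions; parameter values are illustrative assumptions
embedding <- umap(X, n_neighbors = 15, n_components = 2, min_dist = 0.1)

dim(embedding)       # one row per respondent, two columns
head(embedding, 3)

# Pull the two coordinates out as vectors, as in the output above
umap_1 <- embedding[, 1]
umap_2 <- embedding[, 2]
```

UMAP is stochastic, so fixing the seed is needed for the embedding, and therefore the downstream clusters, to be reproducible.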

Next, perform clustering with HDBSCAN

## HDBSCAN clustering for 2608 objects.
## Parameters: minPts = 120
## The clustering contains 6 cluster(s) and 241 noise points.
## 
##    0    1    2    3    4    5    6 
##  241 1080  575  175  147  264  126 
## 
## Available fields: cluster, minPts, cluster_scores, membership_prob,
##                   outlier_scores, hc
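The printed summary above matches the format of `hdbscan()` from the `dbscan` package, so a minimal sketch under that assumption looks like this. The points are synthetic, and `minPts = 20` is chosen for the toy data rather than the `minPts = 120` reported above.

```r
library(dbscan)

set.seed(1)

# Synthetic 2-D points: two tight blobs plus a few scattered points (hypothetical)
pts <- rbind(
  cbind(rnorm(100, 0, 0.3), rnorm(100, 0, 0.3)),
  cbind(rnorm(100, 5, 0.3), rnorm(100, 5, 0.3)),
  cbind(runif(10, -3, 8), runif(10, -3, 8))
)

fit <- hdbscan(pts, minPts = 20)

fit                  # prints a summary like the one above
table(fit$cluster)   # label 0 collects the noise points
```

Because label 0 means "noise" rather than a seventh cluster, later summaries that treat 0 as a group are describing the unassigned students.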

Join the dataframes back together again

## # A tibble: 7 x 2
##   cluster     n
##     <dbl> <int>
## 1       1  1080
## 2       2   575
## 3       5   264
## 4       0   241
## 5       3   175
## 6       4   147
## 7       6   126
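A sketch of the join, assuming the cluster labels come back in the same row order as the rows that were passed to the clustering step. The names `students`, `labels`, and `cluster_df` are hypothetical.

```r
library(dplyr)

# Toy student table (hypothetical)
students <- tibble(
  student_id = 1:5,
  major = c("Mec", "Che", "Civ", "Ele", "Mec")
)

# Hypothetical cluster labels, in the same row order as `students`
labels <- c(1, 2, 0, 1, 2)

# Attach an explicit key to the labels, then join on it
cluster_df <- tibble(student_id = students$student_id, cluster = labels)

joined <- students %>%
  left_join(cluster_df, by = "student_id")

count(joined, cluster, sort = TRUE)
```

Joining on an explicit key rather than binding columns guards against silent misalignment if either table is later reordered or filtered.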

Views of when climate change will affect different groups broken down by cluster assignments

Looking for patterns in the clusters

Understanding cluster compositions more

First create a dataframe with the rankings for each cluster, where a lower ranking means students in the cluster think climate change will affect more categories sooner.

## # A tibble: 7 x 3
##   cluster_time_rank cluster cluster_avg
##               <int>   <dbl>       <dbl>
## 1                 1       1       0.386
## 2                 2       4       3.46 
## 3                 3       0       4.76 
## 4                 4       5       5.14 
## 5                 5       6       7.13 
## 6                 6       2       9.24 
## 7                 7       3      17.4
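The ranking table above could be built along these lines. How `cluster_avg` was actually computed is not shown, so the per-student `time_score` below (lower meaning "sooner, for more groups") is a hypothetical stand-in.

```r
library(dplyr)

# Toy data: one row per student with a hypothetical timing score
responses <- tibble(
  cluster    = c(1, 1, 4, 4, 3, 3),
  time_score = c(0, 1, 3, 4, 17, 18)
)

cluster_ranks <- responses %>%
  group_by(cluster) %>%
  summarize(cluster_avg = mean(time_score), .groups = "drop") %>%
  # lowest average first, so rank 1 is the "soonest" cluster
  arrange(cluster_avg) %>%
  mutate(cluster_time_rank = row_number()) %>%
  select(cluster_time_rank, cluster, cluster_avg)

cluster_ranks
```

Replacing the arbitrary HDBSCAN labels with `cluster_time_rank` gives the clusters a meaningful order for plots and ordinal tests.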

Set clustering colors for all plots

Look at the distribution, by cluster, of when students think each community may be affected by global warming

Same information but faceted with clusters

Note: this is a good plot for seeing that each cluster’s beliefs about when global warming will affect different populations vary in a clear pattern

Overall count for clusters

Broken plot

Career goals analysis

Kruskal-Wallis tests of whether different clusters (students with different temporal discounting patterns of when climate change will affect various groups) consider various career satisfaction factors important

Career satisfaction items (Q4 items)

## # A tibble: 41,728 x 5
##    student_id major cluster_time_rank Q4_item Q4_resp
##         <int> <chr>             <int> <chr>     <dbl>
##  1          1 Ele                   5 Q4a           4
##  2          1 Ele                   5 Q4b           3
##  3          1 Ele                   5 Q4c           3
##  4          1 Ele                   5 Q4d           2
##  5          1 Ele                   5 Q4e           4
##  6          1 Ele                   5 Q4f           2
##  7          1 Ele                   5 Q4g           2
##  8          1 Ele                   5 Q4h           4
##  9          1 Ele                   5 Q4i           4
## 10          1 Ele                   5 Q4j           1
## # ... with 41,718 more rows
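The long format shown above (one row per student per Q4 item) is what `tidyr::pivot_longer()` produces from a wide table. A minimal sketch on made-up data, with only three of the sixteen items:

```r
library(dplyr)
library(tidyr)

# Toy wide table (hypothetical values): one row per student, one column per item
wide <- tibble(
  student_id        = 1:2,
  major             = c("Ele", "Mec"),
  cluster_time_rank = c(5L, 1L),
  Q4a = c(4, 3),
  Q4b = c(3, 2),
  Q4c = c(3, 4)
)

long <- wide %>%
  pivot_longer(
    cols      = starts_with("Q4"),
    names_to  = "Q4_item",
    values_to = "Q4_resp"
  )

long
```

The row count of the real table is consistent with this shape: 2,608 students times 16 items gives 41,728 rows.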

Table of chi-square test results

| Q4_item | Q4_item_name            | statistic |   p.value | parameter | method                     |
|:--------|:------------------------|----------:|----------:|----------:|:---------------------------|
| Q4a     | Make money              |  27.77257 | 0.2697493 |        24 | Pearson’s Chi-squared test |
| Q4b     | Fame                    |  23.05203 | 0.5167281 |        24 | Pearson’s Chi-squared test |
| Q4c     | Help others             |  59.77527 | 0.0000687 |        24 | Pearson’s Chi-squared test |
| Q4d     | Supervise others        |  24.54317 | 0.4309141 |        24 | Pearson’s Chi-squared test |
| Q4e     | Job sec. and opp.       |  41.83517 | 0.0134640 |        24 | Pearson’s Chi-squared test |
| Q4f     | Work w/ people          |  49.69039 | 0.0015501 |        24 | Pearson’s Chi-squared test |
| Q4g     | Invent/design           |  33.19928 | 0.0999380 |        24 | Pearson’s Chi-squared test |
| Q4h     | Develop knowledge/skill |  30.05447 | 0.1829529 |        24 | Pearson’s Chi-squared test |
| Q4i     | Personal/fam. time      |  30.50092 | 0.1686997 |        24 | Pearson’s Chi-squared test |
| Q4j     | Easy job                |  72.65467 | 0.0000009 |        24 | Pearson’s Chi-squared test |
| Q4k     | Exciting env.           |  29.78337 | 0.1920356 |        24 | Pearson’s Chi-squared test |
| Q4l     | Solve societal prob.    |  94.89931 | 0.0000000 |        24 | Pearson’s Chi-squared test |
| Q4m     | Use talent/abilities    |  37.59814 | 0.0381002 |        24 | Pearson’s Chi-squared test |
| Q4n     | Do hands-on work        |  27.20240 | 0.2951075 |        24 | Pearson’s Chi-squared test |
| Q4o     | Apply math/sci.         |  26.14879 | 0.3456586 |        24 | Pearson’s Chi-squared test |
| Q4p     | Volunteer w/ charity    |  78.86848 | 0.0000001 |        24 | Pearson’s Chi-squared test |
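A per-item chi-square table like the one above can be assembled with base R as sketched here. The data are random and the 0-4 response coding is an assumption; what the sketch does reproduce is the degrees of freedom, since seven clusters crossed with five response levels gives (7 - 1) x (5 - 1) = 24.

```r
set.seed(7)

# Random toy data in long format: 350 students, 7 clusters, 2 items (hypothetical)
long <- data.frame(
  Q4_item           = rep(c("Q4a", "Q4b"), each = 350),
  cluster_time_rank = rep(rep(1:7, each = 50), times = 2),
  Q4_resp           = sample(0:4, 700, replace = TRUE)
)

# Run one test of independence per item and collect the results in one table
results <- do.call(rbind, lapply(split(long, long$Q4_item), function(d) {
  test <- chisq.test(table(d$cluster_time_rank, d$Q4_resp))
  data.frame(
    Q4_item   = d$Q4_item[1],
    statistic = unname(test$statistic),
    p.value   = test$p.value,
    parameter = unname(test$parameter),  # degrees of freedom
    method    = test$method
  )
}))
rownames(results) <- NULL

results
```

With sixteen items tested at once, some correction for multiple comparisons (for example `p.adjust()`) is worth considering before reading individual p-values.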

Use the Kruskal-Wallis test instead of the chi-square test, since the outcome is a Likert-scale item (i.e., an ordinal variable)

| Q4_item | Q4_item_name            | statistic |   p.value | parameter | method                       |
|:--------|:------------------------|----------:|----------:|----------:|:-----------------------------|
| Q4a     | Make money              | 12.819453 | 0.0121925 |         4 | Kruskal-Wallis rank sum test |
| Q4b     | Fame                    |  2.078992 | 0.7212328 |         4 | Kruskal-Wallis rank sum test |
| Q4c     | Help others             | 25.037874 | 0.0000494 |         4 | Kruskal-Wallis rank sum test |
| Q4d     | Supervise others        |  2.835811 | 0.5856670 |         4 | Kruskal-Wallis rank sum test |
| Q4e     | Job sec. and opp.       | 12.838757 | 0.0120911 |         4 | Kruskal-Wallis rank sum test |
| Q4f     | Work w/ people          |  3.993174 | 0.4069304 |         4 | Kruskal-Wallis rank sum test |
| Q4g     | Invent/design           |  2.182131 | 0.7023020 |         4 | Kruskal-Wallis rank sum test |
| Q4h     | Develop knowledge/skill |  7.608165 | 0.1070332 |         4 | Kruskal-Wallis rank sum test |
| Q4i     | Personal/fam. time      |  3.142947 | 0.5341950 |         4 | Kruskal-Wallis rank sum test |
| Q4j     | Easy job                |  5.616663 | 0.2296635 |         4 | Kruskal-Wallis rank sum test |
| Q4k     | Exciting env.           | 13.322136 | 0.0098045 |         4 | Kruskal-Wallis rank sum test |
| Q4l     | Solve societal prob.    | 74.868555 | 0.0000000 |         4 | Kruskal-Wallis rank sum test |
| Q4m     | Use talent/abilities    | 20.300904 | 0.0004355 |         4 | Kruskal-Wallis rank sum test |
| Q4n     | Do hands-on work        |  5.069584 | 0.2802320 |         4 | Kruskal-Wallis rank sum test |
| Q4o     | Apply math/sci.         |  1.734258 | 0.7844857 |         4 | Kruskal-Wallis rank sum test |
| Q4p     | Volunteer w/ charity    | 27.906034 | 0.0000130 |         4 | Kruskal-Wallis rank sum test |
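A per-item Kruskal-Wallis test with the Likert response as the outcome and cluster as the grouping variable might look like the sketch below (random data, assumed 0-4 coding). One thing to check against the original code: the Kruskal-Wallis degrees of freedom equal the number of groups minus one, so grouping by seven clusters yields 6, as in this sketch, while the table above reports 4, which would be consistent with grouping by five response levels instead. The output does not show which formula was used.

```r
set.seed(11)

# Random toy data in long format: 350 students, 7 clusters, 2 items (hypothetical)
long <- data.frame(
  Q4_item           = rep(c("Q4a", "Q4b"), each = 350),
  cluster_time_rank = rep(rep(1:7, each = 50), times = 2),
  Q4_resp           = sample(0:4, 700, replace = TRUE)
)

# Outcome = ordinal response, groups = clusters; one test per item
kw <- do.call(rbind, lapply(split(long, long$Q4_item), function(d) {
  test <- kruskal.test(Q4_resp ~ cluster_time_rank, data = d)
  data.frame(
    Q4_item   = d$Q4_item[1],
    statistic = unname(test$statistic),
    p.value   = test$p.value,
    parameter = unname(test$parameter),  # df = number of groups - 1
    method    = test$method
  )
}))
rownames(kw) <- NULL

kw
```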

Career topic interests for all ten Q5 topics, by cluster (instead of by major)

Table of average number of topics that students in each cluster identified

## # A tibble: 7 x 3
##   cluster_time_rank     n avg_Q5_total
##               <int> <int>        <dbl>
## 1                 1  1080         3.68
## 2                 5   126         3.21
## 3                 2   147         3.18
## 4                 4   264         2.98
## 5                 3   241         2.97
## 6                 6   575         2.71
## 7                 7   175         2.06
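The per-cluster averages above can be computed by summing each student's selected topics and then averaging within cluster. In this sketch the Q5 items are assumed to be 0/1 indicators, only three of the ten topics are included, and the values are made up.

```r
library(dplyr)

# Toy data (hypothetical): 1 = topic selected, 0 = not selected
q5 <- tibble(
  cluster_time_rank = c(1L, 1L, 2L, 2L),
  Q5a = c(1, 1, 0, 1),
  Q5b = c(1, 0, 0, 0),
  Q5c = c(1, 1, 1, 0)
)

q5_summary <- q5 %>%
  # number of topics each student selected
  mutate(Q5_total = rowSums(across(starts_with("Q5")))) %>%
  group_by(cluster_time_rank) %>%
  summarize(n = n(), avg_Q5_total = mean(Q5_total), .groups = "drop") %>%
  arrange(desc(avg_Q5_total))

q5_summary
```

This relies on the earlier step that fills NAs with 0 on the Q5 items; otherwise `rowSums()` would return NA for any student who skipped a topic.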

Number of Q5 topics identified in each cluster

Proportions of each cluster interested in Q5 topics

Interests within each cluster

| Q5_item | Q5_item_name               |  statistic |   p.value | parameter | method                     |
|:--------|:---------------------------|-----------:|----------:|----------:|:---------------------------|
| Q5a     | Energy (supply/demand)     |  13.803697 | 0.0319075 |         6 | Pearson’s Chi-squared test |
| Q5b     | Disease                    |   7.383852 | 0.2868020 |         6 | Pearson’s Chi-squared test |
| Q5c     | Poverty and wealth dist.   |  52.194355 | 0.0000000 |         6 | Pearson’s Chi-squared test |
| Q5d     | Climate change             | 203.090564 | 0.0000000 |         6 | Pearson’s Chi-squared test |
| Q5e     | Terrorism and war          |   9.742207 | 0.1359365 |         6 | Pearson’s Chi-squared test |
| Q5f     | Water supply               |  27.878110 | 0.0000991 |         6 | Pearson’s Chi-squared test |
| Q5g     | Food availability          |  30.949248 | 0.0000259 |         6 | Pearson’s Chi-squared test |
| Q5h     | Opp. for future gen        |  15.781208 | 0.0149778 |         6 | Pearson’s Chi-squared test |
| Q5i     | Opp. for women and/or min. |  85.960121 | 0.0000000 |         6 | Pearson’s Chi-squared test |
| Q5j     | Environmental degradation  | 117.482450 | 0.0000000 |         6 | Pearson’s Chi-squared test |

End of career interest section

End cluster analysis